Client Report - Project 0: Introduction

Course DS 250

Author

Sarah Egendoerfer

import pandas as pd
import numpy as np
from lets_plot import *
from palmerpenguins import load_penguins


LetsPlot.setup_html(isolated_frame=True)
# Learn more about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html

# Load the Palmer Penguins dataset into a pandas DataFrame
# (load_penguins is already imported in the setup cell above)
df = load_penguins()

QUESTION|TASK 1

Include the tables created from PY4DS: CH2 Data Visualization used to create the above chart (Hint: copy the code from 2.2.1. The penguins data frame and paste each in the cells below)

# Load the dataset and display it (pandas prints the first and last five rows)
penguins = load_penguins()
penguins
       species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0       Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1       Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2       Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3       Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4       Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007
...        ...        ...             ...            ...                ...          ...     ...   ...
339  Chinstrap      Dream            55.8           19.8              207.0       4000.0    male  2009
340  Chinstrap      Dream            43.5           18.1              202.0       3400.0  female  2009
341  Chinstrap      Dream            49.6           18.2              193.0       3775.0    male  2009
342  Chinstrap      Dream            50.8           19.0              210.0       4100.0    male  2009
343  Chinstrap      Dream            50.2           18.7              198.0       3775.0  female  2009

344 rows × 8 columns

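This table holds the full penguins dataset that the charts below draw from: 344 rows and 8 columns covering species, island, bill and flipper measurements, body mass, sex, and year. Some rows, such as row 3, have missing (NaN) measurements.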
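To check the size of the table and how many values are missing, a short sketch like the one below may help. It rebuilds only the first five rows shown in the output above (the name sample is my own, not from the textbook), so the numbers it prints describe those five rows and not the full 344-row dataset.

```python
import pandas as pd

# Assumption: only the five rows printed in this report, typed in by hand.
sample = pd.DataFrame(
    {
        "species": ["Adelie"] * 5,
        "island": ["Torgersen"] * 5,
        "bill_length_mm": [39.1, 39.5, 40.3, None, 36.7],
        "bill_depth_mm": [18.7, 17.4, 18.0, None, 19.3],
        "flipper_length_mm": [181.0, 186.0, 195.0, None, 193.0],
        "body_mass_g": [3750.0, 3800.0, 3250.0, None, 3450.0],
        "sex": ["male", "female", "female", None, "female"],
        "year": [2007] * 5,
    }
)

n_rows, n_cols = sample.shape             # (number of rows, number of columns)
missing_per_column = sample.isna().sum()  # count of missing values in each column
complete_rows = sample.dropna()           # keep only rows with no missing values

print(n_rows, n_cols)
print(missing_per_column)
print(len(complete_rows))
```

The same three calls (shape, isna().sum(), and dropna()) should work unchanged on the full penguins DataFrame.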

# Show the first five rows of the table
penguins.head()
       species     island  bill_length_mm  bill_depth_mm  flipper_length_mm  body_mass_g     sex  year
0       Adelie  Torgersen            39.1           18.7              181.0       3750.0    male  2007
1       Adelie  Torgersen            39.5           17.4              186.0       3800.0  female  2007
2       Adelie  Torgersen            40.3           18.0              195.0       3250.0  female  2007
3       Adelie  Torgersen             NaN            NaN                NaN          NaN     NaN  2007
4       Adelie  Torgersen            36.7           19.3              193.0       3450.0  female  2007

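penguins.head() returns only the first five rows of the penguins table, which is a quick way to check the column names and a few values without printing everything.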

QUESTION|TASK 2

Recreate the example charts from PY4DS: CH2 Data Visualization of the textbook. (Hint: copy the chart code from 2.2.3. Creating a Plot, one for each cell below)

# Scatterplot of body mass against flipper length
(
    ggplot(
        data=penguins,
        mapping=aes(x="flipper_length_mm", y="body_mass_g"),
    )
    + geom_point()
)

After running this code I can see a positive relationship between flipper length and body mass: penguins with longer flippers tend to have a larger body mass.
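One way to put a number on what the scatterplot shows is the Pearson correlation coefficient. The sketch below is only an illustration: it uses the nine complete rows printed in the table output of this report (typed in by hand under the assumed name sample), so the value it prints describes those nine rows and not the full dataset.

```python
import pandas as pd

# Assumption: the nine complete rows visible in this report's table output.
sample = pd.DataFrame(
    {
        "flipper_length_mm": [181.0, 186.0, 195.0, 193.0, 207.0, 202.0, 193.0, 210.0, 198.0],
        "body_mass_g": [3750.0, 3800.0, 3250.0, 3450.0, 4000.0, 3400.0, 3775.0, 4100.0, 3775.0],
    }
)

# Pearson r ranges from -1 to +1; values above 0 mean the two columns rise together.
r = sample["flipper_length_mm"].corr(sample["body_mass_g"])
print(f"Pearson r on the sample rows: {r:.2f}")
```

On the full table the same call would be penguins["flipper_length_mm"].corr(penguins["body_mass_g"]); Series.corr leaves out pairs with missing values, so the NaN rows should not need to be dropped first.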

# Same scatterplot, with points colored by species
(
    ggplot(
        data=penguins,
        mapping=aes(x="flipper_length_mm", y="body_mass_g", color="species"),
    )
    + geom_point()
)

Mapping color="species" inside aes() colors the points by species, so the three penguin species can be told apart.

# Add a linear-model trend line on top of the colored points
(
    ggplot(
        data=penguins,
        mapping=aes(x="flipper_length_mm", y="body_mass_g", color="species"),
    )
    + geom_point()
    + geom_smooth(method="lm")
)

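Because color="species" is mapped in ggplot(), geom_smooth(method="lm") adds a separate straight trend line for each species, which makes the positive relationship within each species easier to see.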
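To get a feel for what a straight trend line per species means, the sketch below fits an ordinary least-squares line to each group with numpy.polyfit. This is my own approximation of the idea, not the code lets_plot runs internally, and it leaves out the shaded confidence band. The helper name fit_lines and the toy data are hypothetical; the toy values are made up so the expected slopes are easy to check by hand.

```python
import numpy as np
import pandas as pd


def fit_lines(df, x, y, group):
    """Return {group value: (slope, intercept)} from a least-squares line per group."""
    lines = {}
    # Drop rows with a missing x or y so polyfit does not receive NaN values.
    clean = df.dropna(subset=[x, y])
    for name, rows in clean.groupby(group):
        slope, intercept = np.polyfit(rows[x], rows[y], deg=1)
        lines[name] = (float(slope), float(intercept))
    return lines


# Hypothetical toy data (not penguin measurements): group A rises 20 g per mm,
# group B rises 30 g per mm.
toy = pd.DataFrame(
    {
        "species": ["A", "A", "A", "B", "B", "B"],
        "flipper_length_mm": [180.0, 190.0, 200.0, 180.0, 190.0, 200.0],
        "body_mass_g": [3600.0, 3800.0, 4000.0, 4100.0, 4400.0, 4700.0],
    }
)

lines = fit_lines(toy, "flipper_length_mm", "body_mass_g", "species")
for name, (slope, intercept) in lines.items():
    print(name, round(slope, 2), round(intercept, 2))
```

With the real data the call would be fit_lines(penguins, "flipper_length_mm", "body_mass_g", "species"), which should give one slope and intercept per species.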

# Color and shape by species on the points only, plus readable labels
(
    ggplot(data=penguins, mapping=aes(x="flipper_length_mm", y="body_mass_g"))
    + geom_point(aes(color="species", shape="species"))
    + geom_smooth(method="lm")
    + labs(
        title="Body mass and flipper length",
        subtitle="Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
        x="Flipper length (mm)",
        y="Body mass (g)",
        color="Species",
        shape="Species",
    )
)

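labs() sets the title, subtitle, x- and y-axis labels, and the legend title for species. Because color and shape are mapped inside geom_point() instead of ggplot(), geom_smooth() now draws a single trend line for all penguins.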

# Split the scatterplot into one panel per island
(
    ggplot(penguins, aes(x="flipper_length_mm", y="body_mass_g"))
    + geom_point(aes(color="species", shape="species"))
    + facet_wrap(facets="island")
)

facet_wrap(facets="island") splits the data into three panels, one per island, which makes the chart easier to read.
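The split that facet_wrap makes for the chart is similar to a pandas groupby on the same column. The sketch below is a rough analogy under my own assumptions: it uses five rows copied from the table output in this report (only two of the islands appear in those rows), and the names sample and panels are hypothetical.

```python
import pandas as pd

# Assumption: five rows copied by hand from the table output in this report.
sample = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Chinstrap"],
        "island": ["Torgersen", "Torgersen", "Dream", "Dream", "Dream"],
        "flipper_length_mm": [181.0, 186.0, 207.0, 202.0, 193.0],
        "body_mass_g": [3750.0, 3800.0, 4000.0, 3400.0, 3775.0],
    }
)

# One sub-table per island, like one panel per island in the faceted chart.
panels = {island: rows for island, rows in sample.groupby("island")}

for island, rows in panels.items():
    print(island, len(rows), "rows")
```

Each entry in panels holds only the rows for one island, which is the data a single facet would plot.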